Create an interactive Parallel Plot
To demonstrate the use of the interactive parallel plot, we use a project already loaded into the CKG database.
[1]:
import pandas as pd
from ckg.report_manager import project, dataset, report
from ckg.analytics_core.viz import viz as plots
import networkx as nx
from networkx.readwrite import json_graph
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
from scipy.stats import zscore
init_notebook_mode(connected=True)
%matplotlib inline
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
c:\users\sande\.conda\envs\pip_rev\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning:
The package pingouin is out of date. Your version is 0.3.11, the latest is 0.3.12.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.
We create a new project object and load the respective data and report
[2]:
my_project = project.Project(identifier='P0000001', datasets={}, report={})
my_project.load_project_data()
my_project.load_project_report()
We can now access all the results for each data type
[3]:
my_project.list_datasets()
[3]:
dict_keys(['clinical', 'multiomics', 'proteomics'])
We will use the results from the proteomics analyses. We access the dataset ‘proteomics’ for further analysis
[4]:
proteomics_dataset = my_project.get_dataset('proteomics')
The available analyses for this dataset are:
[5]:
my_project.get_dataset('proteomics').list_dataframes()
[5]:
['complex_associations',
'correlation_correlation',
'disease_associations',
'drug_associations',
'go annotation',
'go_enrichment_Biological_processes_regulation_enrichment',
'interaction_network',
'literature_associations_publications_abstracts',
'number of modified proteins',
'number of peptides',
'number of proteins',
'original',
'overview statistics_summary',
'pathway annotation',
'pathway_enrichment_Pathways_regulation_enrichment',
'processed',
'protein biomarkers',
'regulated',
'regulation table',
'tissue qcmarkers']
We can access the different dataframes like this:
[6]:
my_project.get_dataset('proteomics').get_dataframe('go annotation')
[6]:
| | annotation | group | identifier | source |
---|---|---|---|---|
0 | mitochondrial genome maintenance | None | TYMP~P19971 | UniProt |
1 | maltose metabolic process | None | MGAM~O43451 | UniProt |
2 | maltose metabolic process | None | GAA~P10253 | UniProt |
3 | ribosomal large subunit assembly | None | RPL11~P62913 | UniProt |
4 | ribosomal large subunit assembly | None | RPL6~Q02878 | UniProt |
5 | ribosomal large subunit assembly | None | RPL3~P39023 | UniProt |
6 | ribosomal large subunit assembly | None | RPLP0~P05388 | UniProt |
7 | ribosomal small subunit assembly | None | RPS28~P62857 | UniProt |
8 | ribosomal small subunit assembly | None | RPS5~P46782 | UniProt |
9 | ribosomal small subunit assembly | None | RPS14~P62263 | UniProt |
10 | ribosomal small subunit assembly | None | RPS19~P39019 | UniProt |
11 | ribosomal small subunit assembly | None | RPS27~P42677 | UniProt |
12 | very long-chain fatty acid metabolic process | None | ACAA1~P09110 | UniProt |
13 | autophagosome assembly | None | RAB1A~P62820 | UniProt |
14 | autophagosome assembly | None | NSFL1C~Q9UNZ2 | UniProt |
15 | autophagosome assembly | None | UBQLN1~Q9UMX0 | UniProt |
16 | autophagosome assembly | None | RAB7A~P51149 | UniProt |
17 | urea cycle | None | ASS1~P00966 | UniProt |
18 | urea cycle | None | CPS1~P31327 | UniProt |
19 | urea cycle | None | OTC~P00480 | UniProt |
20 | urea cycle | None | ARG1~P05089 | UniProt |
21 | urea cycle | None | ASL~P04424 | UniProt |
22 | citrulline metabolic process | None | ASS1~P00966 | UniProt |
23 | argininosuccinate metabolic process | None | ASS1~P00966 | UniProt |
24 | ribosomal subunit export from nucleus | None | RAN~P62826 | UniProt |
25 | ribosomal subunit export from nucleus | None | EIF6~P56537 | UniProt |
26 | ribosomal large subunit export from nucleus | None | RAN~P62826 | UniProt |
27 | ribosomal large subunit export from nucleus | None | NPM1~P06748 | UniProt |
28 | ribosomal small subunit export from nucleus | None | NPM1~P06748 | UniProt |
29 | ribosomal small subunit export from nucleus | None | RAN~P62826 | UniProt |
... | ... | ... | ... | ... |
17753 | negative regulation of extrinsic apoptotic sig... | None | SCG2~P13521 | UniProt |
17754 | negative regulation of extrinsic apoptotic sig... | None | GSTP1~P09211 | UniProt |
17755 | negative regulation of extrinsic apoptotic sig... | None | LMNA~P02545 | UniProt |
17756 | negative regulation of extrinsic apoptotic sig... | background | THBS1~P07996 | UniProt |
17757 | positive regulation of extrinsic apoptotic sig... | None | PTPRC~P08575 | UniProt |
17758 | positive regulation of extrinsic apoptotic sig... | background | AGT~P01019 | UniProt |
17759 | positive regulation of extrinsic apoptotic sig... | None | BID~P55957 | UniProt |
17760 | positive regulation of extrinsic apoptotic sig... | background | PDIA3~P30101 | UniProt |
17761 | positive regulation of extrinsic apoptotic sig... | None | PAK2~Q13177 | UniProt |
17762 | positive regulation of extrinsic apoptotic sig... | None | PYCARD~Q9ULZ3 | UniProt |
17763 | regulation of extrinsic apoptotic signaling pa... | None | FGFR1~P11362 | UniProt |
17764 | negative regulation of extrinsic apoptotic sig... | background | PRDX2~P32119 | UniProt |
17765 | negative regulation of extrinsic apoptotic sig... | None | COL2A1~P02458 | UniProt |
17766 | positive regulation of extrinsic apoptotic sig... | None | PPP1CA~P62136 | UniProt |
17767 | regulation of intrinsic apoptotic signaling pa... | None | PYCARD~Q9ULZ3 | UniProt |
17768 | negative regulation of intrinsic apoptotic sig... | None | DDX3X~O00571 | UniProt |
17769 | positive regulation of intrinsic apoptotic sig... | background | S100A8~P05109 | UniProt |
17770 | positive regulation of intrinsic apoptotic sig... | None | BID~P55957 | UniProt |
17771 | positive regulation of intrinsic apoptotic sig... | None | SLC9A3R1~O14745 | UniProt |
17772 | positive regulation of intrinsic apoptotic sig... | background | S100A9~P06702 | UniProt |
17773 | regulation of phosphatidylcholine biosynthetic... | None | FABP3~P05413 | UniProt |
17774 | regulation of store-operated calcium entry | None | CD84~Q9UIB8 | UniProt |
17775 | regulation of store-operated calcium entry | None | STC2~O76061 | UniProt |
17776 | regulation of store-operated calcium entry | None | STIM1~Q13586 | UniProt |
17777 | positive regulation of cation channel activity | None | CTSS~P25774 | UniProt |
17778 | regulation of semaphorin-plexin signaling pathway | background | NCAM1~P13591 | UniProt |
17779 | negative regulation of cysteine-type endopepti... | None | PARK7~Q99497 | UniProt |
17780 | positive regulation of cysteine-type endopepti... | background | GSN~P06396 | UniProt |
17781 | positive regulation of cysteine-type endopepti... | None | FAS~P25445 | UniProt |
17782 | negative regulation of cysteine-type endopepti... | None | PAK2~Q13177 | UniProt |
17783 rows × 4 columns
In this case, we will use the processed dataframe with transformed and imputed LFQ intensities. We then normalize the data by converting each protein column to z-scores.
[7]:
proteomics_dataset = my_project.get_dataset('proteomics')
processed_df = proteomics_dataset.get_dataframe('processed')
[8]:
processed_df.head()
[8]:
| | A2M~P01023 | A30~A2MYE2 | ABI3BP~Q7Z7G0 | ACE~P12821 | ACTB~P60709 | ACTN1~P12814 | ADA2~Q9NZK5 | ADAMTS13~Q76LX8 | ADAMTSL4~Q6UY14 | ADH4~P08319 | ... | VIM~P08670 | VK3~A2N2F4 | VNN1~O95497 | VTN~P04004 | VWF~P04275 | YWHAZ~P63104 | group | sample | scFv~Q65ZC9 | subject |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 38.005564 | 28.173504 | 21.588427 | 22.213865 | 27.090330 | 25.039968 | 23.442151 | 24.010605 | 25.085820 | 23.389032 | ... | 24.178889 | 25.835908 | 22.480055 | 32.815815 | 28.922779 | 19.246215 | Cirrhosis | AS1181 | 27.788928 | S368 |
1 | 37.309118 | 27.981907 | 27.342062 | 23.847270 | 27.461155 | 25.896268 | 23.754503 | 24.135818 | 19.241174 | 22.148706 | ... | 23.709777 | 25.004889 | 23.852908 | 32.722121 | 29.881279 | 22.141285 | Cirrhosis | AS1182 | 26.869972 | S369 |
2 | 37.384952 | 28.857627 | 20.156993 | 22.863630 | 27.929764 | 24.295225 | 23.359443 | 24.121788 | 24.923476 | 23.017163 | ... | 23.599064 | 26.271650 | 24.232132 | 32.755752 | 29.444625 | 18.901149 | Cirrhosis | AS1184 | 28.069328 | S371 |
3 | 38.417225 | 28.978380 | 25.501910 | 22.992774 | 27.152479 | 25.231288 | 23.701340 | 24.568309 | 24.878802 | 26.388112 | ... | 24.179076 | 25.929200 | 24.269047 | 32.714014 | 29.397176 | 22.216971 | Cirrhosis | AS1185 | 28.170209 | S372 |
4 | 37.471303 | 28.748744 | 20.658038 | 21.949025 | 27.537048 | 22.392992 | 22.406264 | 24.961173 | 22.246468 | 24.339540 | ... | 23.865224 | 26.701340 | 20.490667 | 32.722691 | 28.540895 | 20.797497 | Cirrhosis | AS1186 | 28.612280 | S373 |
5 rows × 517 columns
[9]:
processed_df = processed_df.drop(['sample', 'subject'], axis=1).set_index('group').apply(zscore).reset_index()
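To make the normalization step concrete, here is a minimal sketch on a made-up table (the group labels `g1`/`g2` and the columns `P1`/`P2` are placeholders, not real data). `scipy.stats.zscore` uses the population standard deviation (`ddof=0`) by default, so each column should end up with mean 0 and a population standard deviation of 1.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Toy stand-in for the processed dataframe (all values are invented).
toy = pd.DataFrame({
    'group': ['g1', 'g1', 'g2'],
    'P1': [1.0, 2.0, 3.0],
    'P2': [10.0, 10.0, 40.0],
})

# Move the label column to the index so that only numeric columns are scaled,
# apply zscore column by column, then restore 'group' as a regular column.
scaled = toy.set_index('group').apply(zscore).reset_index()

# Check the result: per-column mean and population standard deviation.
numeric = scaled.drop(columns='group')
col_means = numeric.mean()
col_stds = numeric.std(ddof=0)
```

Note that the scaling is done per protein across all samples, not within each group.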
In order to find clusters of proteins, we access the report and the protein-protein correlation network as a dictionary.
[10]:
proteomics_report = my_project.get_dataset('proteomics').report
proteomics_report.list_plots()
[10]:
dict_keys(['0_date', '0~proteomics_pipeline~cytoscape_network', '10~regulation_description~description', '11~regulation_anova~basicTable', '12~regulation_anova~volcanoplot', '13~correlation_correlation~network', '14~interaction_network~network', '15~complex_associations~basicTable', '16~drug_associations~basicTable', '17~disease_associations~basicTable', '18~literature_associations_publications_abstracts~basicTable', '19~literature_associations_publications_abstracts~wordcloud', '1~overview statistics_summary~multiTable', '20~go_enrichment_Biological_processes_regulation_enrichment~basicTable', '21~pathway_enrichment_Pathways_regulation_enrichment~basicTable', '2~proteins~barplot', '3~proteins~basicTable', '4~coefficient_variation_coefficient_of_variation~scatterplot_matrix', '5~quality_control_qcmarkers~qcmarkers_boxplot', '6~ranking_ranking_with_markers~ranking', '7~ranking_ranking_with_markers~basicTable', '8~stratification_description~description', '9~stratification_pca~pca'])
[14]:
correlation_net_dict = proteomics_report.get_plot('13~correlation_correlation~network')[0]
To convert the dictionary into a network, we access the JSON version within the dictionary and convert it using the NetworkX package.
[15]:
correlation_net = json_graph.node_link_graph(correlation_net_dict['net_json'])
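The conversion above relies on NetworkX's node-link format. The sketch below shows a round trip on a toy graph, assuming that `net_json` holds data of the kind produced by `json_graph.node_link_data` (the call used above suggests this, but the exact CKG payload is not shown here). Node names and attributes are invented.

```python
import networkx as nx
from networkx.readwrite import json_graph

# Toy network; node names, clusters and colors are invented.
g = nx.Graph()
g.add_node('P1', cluster=0, color='#1f77b4')
g.add_node('P2', cluster=0, color='#1f77b4')
g.add_node('P3', cluster=1, color='#ff7f0e')
g.add_edge('P1', 'P2', weight=0.9)

# Serialize to a JSON-compatible dictionary, then rebuild the graph from it.
payload = json_graph.node_link_data(g)
restored = json_graph.node_link_graph(payload)
```

Node attributes such as `cluster` and `color` survive the round trip, which is what makes it possible to recover the cluster assignment from the stored network.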
Now that we have a network with proteins colored by cluster, we can convert this information into a dataframe to be used in this Jupyter Notebook.
[16]:
correlation_df = pd.DataFrame.from_dict(correlation_net.nodes(data=True))
correlation_df = correlation_df[0].to_frame().join(correlation_df[1].apply(pd.Series))
[17]:
correlation_df.columns = ['identifier', 'degree', 'radius', 'color', 'cluster']
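As an alternative sketch, the node attributes can be turned into a dataframe whose column names come from the attribute keys themselves, so the result does not depend on the order in which the attributes are stored. The toy graph and its attributes are invented for illustration.

```python
import networkx as nx
import pandas as pd

# Toy network with two annotated nodes (invented values).
g = nx.Graph()
g.add_node('P1', degree=2, cluster=0)
g.add_node('P2', degree=1, cluster=1)

# dict(g.nodes(data=True)) maps each node name to its attribute dictionary;
# orient='index' turns every node into one row.
node_df = (
    pd.DataFrame.from_dict(dict(g.nodes(data=True)), orient='index')
    .rename_axis('identifier')
    .reset_index()
)
```

Here `node_df` has one row per node, an `identifier` column with the node name, and one column per node attribute.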
Since the correlation network was generated using a cut-off, not all the proteins in the processed dataframe are part of a cluster. We therefore filter the processed dataframe and keep only the proteins that are present in the correlation clusters.
[18]:
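# Overall value range of the z-scored intensities (numeric columns only)
numeric_df = processed_df.select_dtypes(include='number')
min_val = numeric_df.min().min().round()
max_val = numeric_df.max().max().round()
# Keep only the proteins that belong to a correlation cluster, plus the group label
processed_df = processed_df[list(correlation_df.identifier) + ['group']]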
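Selecting columns with a plain list raises a `KeyError` if any network node is missing from the dataframe. Whether that can happen with CKG output is not stated here, so the following is only a defensive sketch on invented data: it keeps the identifiers that exist as columns and silently skips the rest.

```python
import pandas as pd

# Toy processed data and cluster table (all names and values are invented).
processed = pd.DataFrame({
    'group': ['g1', 'g2'],
    'P1': [0.5, -0.5],
    'P2': [1.0, -1.0],
    'P3': [0.0, 0.0],
})
clustered = pd.DataFrame({'identifier': ['P1', 'P3', 'P9'], 'cluster': [0, 1, 1]})

# Keep only identifiers that are actual columns; 'P9' is skipped
# instead of triggering a KeyError.
keep = [p for p in clustered['identifier'] if p in processed.columns]
filtered = processed[keep + ['group']]
```

`P2` is dropped because it is not in any cluster, and `P9` is ignored because it has no measurements.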
Ready! To build the parallel plot, we create a dictionary with the clusters and their respective colors, and filter the processed dataframe to include only the proteins in a specific cluster.
Using the Jupyter Widgets interact function, we can make the plot interactive and allow the visualization of a cluster selected by the user.
[19]:
from IPython.display import display, HTML
[20]:
@interact
def plot_parallel_plot(cluster=correlation_df.cluster.unique()):
    # Map each cluster to its color and collect the proteins in the selected cluster
    cluster_colors = dict(zip(correlation_df.cluster, correlation_df.color))
    clusters = correlation_df.groupby('cluster')
    identifiers = clusters.get_group(cluster)['identifier'].tolist()
    title = "Parallel plot cluster: {}".format(cluster)
    # Keep only the selected proteins (intensities were already z-scored above)
    df = processed_df.set_index('group')[identifiers].reset_index()
    figure = plots.get_parallel_plot(df, identifier=cluster, args={'color': cluster_colors[cluster],
                                                                   'group': 'group',
                                                                   'title': title,
                                                                   'zscore': False})
    # List the protein identifiers above the plot
    display(HTML("<p>{}</p>".format(",".join(identifiers))))
    iplot(figure.figure)
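The data selection inside the interactive function can also be tested on its own, without widgets or plotting. The sketch below uses a hypothetical helper, `select_cluster`, which is not part of CKG, and assumes that every protein in a cluster shares the same color. All data is invented.

```python
import pandas as pd

def select_cluster(processed, clusters, cluster):
    """Return (dataframe, color) for one cluster.

    Hypothetical helper for illustration; assumes one color per cluster.
    """
    members = clusters.loc[clusters['cluster'] == cluster]
    identifiers = members['identifier'].tolist()
    color = members['color'].iloc[0]
    return processed[['group'] + identifiers], color

# Toy inputs mirroring the shape of processed_df and correlation_df.
processed = pd.DataFrame({
    'group': ['g1', 'g2'],
    'P1': [0.5, -0.5],
    'P2': [1.0, -1.0],
    'P3': [0.2, -0.2],
})
clusters = pd.DataFrame({
    'identifier': ['P1', 'P2', 'P3'],
    'cluster': [0, 0, 1],
    'color': ['#1f77b4', '#1f77b4', '#ff7f0e'],
})

df, color = select_cluster(processed, clusters, 1)
```

Keeping the selection in a plain function like this makes it easy to check which proteins a cluster contains before passing the result to a plotting call.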